Abstract

As  important  vectors  of  human  disease,  phlebotomine  sand  flies  are  of  global  significance  to 
human  health,  transmitting  several  emerging  and  re-emerging  infectious  diseases.  The  most 
devastating  of  the  sand  fly  transmitted  infections  are  the  leishmaniases,  causing  significant 
mortality  and  morbidity  in  both  the  Old  and  New  World.  Here  we  present  the  first  global 
transcriptome analysis of the Old World vector of cutaneous leishmaniasis, Phlebotomus papatasi
(Scopoli)  and  compare  this  transcriptome  to  that  of  the  New  World  vector  of  visceral 
leishmaniasis,  Lutzomyia  longipalpis.  A  normalized  cDNA  library  was  constructed  using  pooled 
mRNA from Phlebotomus papatasi larvae, pupae, adult males and females sugar fed, adult
females  blood  fed  and  fed  blood  infected  with  Leishmania  major.  A  total  of  47,615  generated 
sequences  were  cleaned  and  assembled  into  17,120  unique  transcripts.  Of  the  assembled 
sequences,  50%  (8,837  sequences)  were  classified  using  Gene  Ontology  (GO)  terms.  This 
collection  of  transcripts  is  comprehensive,  as  demonstrated  by  the  high  number  of  different  GO 
categories.  An  in  depth  analysis  has  revealed  245  sequences  with  putative  homology  to  proteins 
involved  in  blood  and  sugar  digestion,  immune  response  and  peritrophic  matrix  formation.  Twelve 
of  the  novel  genes,  including  one  trypsin,  two  peptidoglycan  recognition  proteins  (PGRP)  and  nine 
chymotrypsins  have  a  higher  expression  level  during  larval  stages.  Two  novel  chymotrypsins  and 
one  novel  PGRP  are  abundantly  expressed  upon  blood  feeding.  This  study  will  greatly  improve 
the  available  genomic  resources  for  Ph.  papatasi  and  will  provide  essential  information  for 
annotation  of  the  full  genome. 


Introduction 

Phlebotomine  sand  flies  are  important  vectors  of  human  disease  in  both  the  Old  and  the  New 
World,  transmitting  protozoan,  bacterial  and  viral  pathogens.  These  flies  are  members  of  the 
family  Psychodidae,  which  includes  a  diverse  group  of  vectors  that  vary  widely  in 
geographic  distribution,  ecology,  and  the  pathogens  they  transmit.  Sand  flies  serve  as 
vectors  for  several  established,  emerging  and  re-emerging  infectious  diseases,  the  most 
devastating  of  which  are  the  leishmaniases  with  350  million  people  at  risk  and 
approximately  two  million  new  cases  each  year  (Cunningham  2002,  Desjeux  1996).  Human 
migration, political instability, and warfare are expanding Leishmania-endemic regions and
increasing  the  propensity  for  epidemics  world-wide  (Desjeux  2001).  In  spite  of  the  medical 
importance  of  leishmaniasis  it  is  classified  as  a  neglected  tropical  disease  and  phlebotomine 
sand  fly  species  remain  understudied. 

Approximately  40  different  species  of  Leishmania  are  transmitted  by  35  different  sand  fly 
species  (Ramalho-Ortigao,  Saraiva  &  Traub-Cseko  2010).  Most  vectors  belong  to  one  of 
two genera, Phlebotomus and Lutzomyia (Richard P. Lane, Roger W. Crosskey 1993).
Phlebotomus  species  are  responsible  for  transmitting  leishmaniasis  throughout  parts  of 
Africa,  southwest  Asia,  the  Middle  East,  and  the  Mediterranean  basin;  Lutzomyia  species 
are  vectors  throughout  the  Americas.  There  is  a  close  ecological  relationship  between 
Leishmania species and the vector(s) that transmit that species. For example, Phlebotomus
papatasi only transmits Leishmania major, whereas Lutzomyia longipalpis transmits Le.
infantum (Killick-Kendrick 1999). Although most vectors are specific under natural
conditions, some, such as Lu. longipalpis, can transmit a range of Leishmania species under
laboratory conditions.

Genomics  approaches  enable  comprehensive  comparisons  between  diverse  organisms,  thus 
facilitating  a  more  complete  understanding  of  their  biology.  The  completed  genomes  of  the 
more  widely  studied  hematophagous  vectors,  the  malaria  vector  Anopheles  gambiae,  the 
yellow fever mosquito Aedes aegypti, and the West Nile virus vector Culex quinquefasciatus,
have  already  contributed  valuable  information  relative  to  vectorial  capacity,  blood-feeding, 
insect  immune  system  modulation  and  insecticide  resistance  (Holt  et  al.  2002,  Christophides 
et  al.  2002,  Nene  et  al.  2007,  Arensburger  et  al.  2010).  Previous  genomic  studies  concerning 
sand  flies  have  focused  on  specific  questions  regarding  vector-human  interactions  (analysis 
of  protein  expression  in  salivary  glands  (Hostomska  et  al.  2009))  or  vector-parasite 
interactions  (analysis  of  midgut  expressed  proteins  (Ramalho-Ortigao  et  al.  2007));  only  one 
study  has  performed  a  global  gene  discovery  analysis  of  a  sand  fly,  Lu.  longipalpis  (Dillon 
et al. 2006). Here, we expand these studies by characterizing the transcriptome of the Le.
major vector, Ph. papatasi. We also similarly reanalyze the Lu. longipalpis Expressed
Sequence Tag (EST) dataset (Dillon et al. 2006) for a comparative analysis.

Phlebotomine  sand  flies  along  with  mosquitoes  (family  Culicidae)  are  members  of  the 
suborder  Nematocera,  but  these  two  families  are  representatives  of  distinct  infraorders 
within  the  Nematocera.  Ph.  papatasi  and  Lu.  longipalpis  exhibit  distinct  geographical 
distributions,  ecology,  and  vector  competence  specificity.  A  comparative  approach  between 
these  flies  will  accelerate  the  discovery  of  regulatory  and  biochemical  pathways  within  this 
family  as  potential  biopharmaceuticals,  vaccine  candidates,  and  targets  for  insecticide 
development.  Moreover,  comparative  analyses  between  these  and  other  available  vector  data 
sets  will  elucidate  the  pathways  that  lead  to  arthropod  blood-feeding  and  immunity  and 
inform  arthropod  phylogenetic  relationships. 

ESTs  represent  the  expressed  portion  of  mRNA  in  a  cell  obtained  through  single  pass 
sequencing  of  randomly  selected  cDNA  clones,  resulting  in  about  200-800  bp  of  sequence 
information  for  each  clone  (Mark  Blaxter  et  al.  2009).  EST  studies  can  be  used  for  gene 
discovery  in  organisms  where  sequencing  the  whole  genome  is  not  possible  (Lindlof  2003), 
or  in  addition  to  genome  information  for  more  accurate  gene  annotation.  A  phlebotomine 
sand fly genome sequencing project was initiated a few years ago to sequence Lu.
longipalpis and Ph. papatasi (McDowell et al. 2006). The aim of this study was to increase
the  availability  of  EST  resources  for  future  sand  fly  studies,  provide  useful  information 
regarding  the  biology  of  these  important  vectors,  and  generate  essential  data  for  annotation 
of  the  newly  sequenced  phlebotomine  sand  fly  genomes  (McDowell  et  al.  2006). 

Results  and  Discussion 

Assembly 

Mate pair information generated during the sequencing process was utilized in a two-step
assembly process. First, the sequence reads were assembled with their mates using a lower
identity criterion, based on the assumption that they are opposite ends of the same cDNA
clone and thus should assemble. The resulting sequences were then assembled using a more
stringent identity parameter to avoid over-collapsing closely related gene families into a
single sequence. There were 47,615 sequences initially generated from the normalized Ph.
papatasi library. Of all initial Ph. papatasi reads, 10,128 (21%) failed the screening and
filtering steps (see Experimental procedures) (Figure 1). The remaining 37,487 sequences
were then assembled into 6,187 contigs and 10,933 singlets (Table 1), representing a total of
11 Mb of the Ph. papatasi transcriptome; the average assembled sequence length was 550 bp.
Assembled sequences with a length greater than 200 bp were deposited in GenBank
(JP539097-JP555361).
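
The read bookkeeping reported above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis pipeline; the function name and dictionary keys are illustrative assumptions, and only the four input counts come from the text.

```python
# Counts reported in the text for the normalized Ph. papatasi library.
RAW_READS = 47_615
FAILED_SCREENING = 10_128
CONTIGS = 6_187
SINGLETS = 10_933


def assembly_summary(raw, failed, n_contigs, n_singlets):
    """Return basic bookkeeping figures for an EST assembly (illustrative helper)."""
    cleaned = raw - failed
    return {
        "cleaned_reads": cleaned,                      # reads entering assembly
        "failed_percent": 100 * failed / raw,          # share removed by screening
        "assembled_sequences": n_contigs + n_singlets,  # unique transcripts
    }


summary = assembly_summary(RAW_READS, FAILED_SCREENING, CONTIGS, SINGLETS)
print(summary["cleaned_reads"])          # 37487
print(round(summary["failed_percent"]))  # 21
print(summary["assembled_sequences"])    # 17120
```

The sum of contigs and singlets reproduces the 17,120 unique transcripts quoted in the Abstract.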

A total of 3,909 (14%) Lu. longipalpis sequences failed the screening and filtering steps.
The remaining cleaned sequences (24,019) were assembled into 5,063 contigs and 4,963
singlets with an average length of 1,041 bp, representing a total of 8 Mb of the Lu.
longipalpis transcriptome (Table 1). The differences in assembly results between the current
study and the previous assembly by Dillon et al. (Dillon et al. 2006) can be explained by the
use of a different assembly program (Cap3 vs. Phrap) as well as by the different assembly
strategy used (Supplementary Figure 1) (Huang, Madan 1999).

For our analyses we defined a read as a sequence that was cleaned and trimmed before the
first assembly step. A mate pair represented two reads sequenced from the same cDNA;
these reads had the same name with a different ending denoting either the 5′ or 3′ end. A
mated read contig referred to mate pairs that co-assembled. A contig was either a mated read
or a set of sequences that were not mate pairs but assembled under the conditions indicated
above.  One  contig  represented  one  gene,  unless  otherwise  specified.  Singlets  are  sequences 
that  failed  to  assemble  with  any  other  sequence  in  the  set,  including  their  own  mate  pairs. 
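
As an illustration of the mate pair definition, the sketch below groups read names into mate pairs by stripping an end label from the name. It is a hedged example only: the suffixes `_5` and `_3` and the read names are hypothetical, since the text states only that mates share a name with a different ending for the 5′ or 3′ end.

```python
from collections import defaultdict

# Hypothetical end labels; the actual naming scheme is not given in the text.
END_SUFFIXES = ("_5", "_3")


def group_mates(read_names):
    """Split read names into mate pairs and unpaired reads.

    Returns (pairs, unpaired), where each pair is a (five_prime, three_prime)
    tuple of reads that share the same clone name.
    """
    by_clone = defaultdict(dict)
    unpaired = []
    for name in read_names:
        for suffix in END_SUFFIXES:
            if name.endswith(suffix):
                by_clone[name[: -len(suffix)]][suffix] = name
                break
        else:
            unpaired.append(name)  # no recognised end label
    pairs = []
    for clone in sorted(by_clone):
        ends = by_clone[clone]
        if len(ends) == len(END_SUFFIXES):
            pairs.append(tuple(ends[s] for s in END_SUFFIXES))
        else:
            unpaired.extend(ends.values())  # mate missing after cleaning
    return pairs, unpaired


pairs, unpaired = group_mates(["cloneA_5", "cloneA_3", "cloneB_5"])
print(pairs)     # [('cloneA_5', 'cloneA_3')]
print(unpaired)  # ['cloneB_5']
```

In a two-step strategy of the kind described, the pairs would be candidates for the first, lower-identity assembly pass, while unpaired reads would enter the second pass directly.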

Similarity  to  known  proteins  and  GO  annotation 

Of the 17,120 Ph. papatasi assembled sequences, 4,286 (25%) had no matches when
searched against the NR and InterPro databases using a BLAST search (BLASTX) with an
e-value threshold of 10"^, making them potentially unique Ph. papatasi sequences. The
average length of assembled sequences with matches against either the NR or InterPro
database was between 500 and 699 bp (Figure 2). Of the total assembled sequences, 8,837
(50%) of the Ph. papatasi and 4,411 (44%) of the Lu. longipalpis sequences could be
associated with a GO term in at least one of the three main categories (biological process,
molecular function or cellular component). The level of annotation is lower than that for an
Anopheles mosquito (Anopheles albimanus, 65%) (Martinez-Barnetche et al. 2012) and higher
than that for non-model organisms with few or no closely related species in the NCBI NR
database (~17-50%) (Du et al. 2012, Hou et al. 2011, Wang et al. 2010, Shen et al. 2011).
The limited annotation
may  be  explained  by  the  lack  of  sequences  for  closely  related  species  available  in  NR  and 
InterPro. 

The  high  number  of  different  GO  categories  suggests  that  our  cDNA  library  is  representative 
of the entire organism. The distribution of GO categories was highly similar (Figure S2-4)
between the two sand fly species, with Ph. papatasi having 35 (0.2%) sequences in four
extra categories (viral reproduction, external encapsulating structure, symbiosis and
neurotransmission). The smaller number of sequences available for Lu.
longipalpis,  combined  with  the  small  number  of  sequences  annotated  in  the  Ph.  papatasi 
dataset,  might  account  for  their  absence  in  the  Lu.  longipalpis  dataset.  Some  of  the 
differences were probably due to the original source of RNA prepared from the two species:
the Lu. longipalpis RNA was extracted from adult females only, while for the Ph. papatasi
RNA extraction, immature life stages and males were included (see Experimental
procedures and Dillon et al. 2006). There were no GO categories present in
Lu.  longipalpis  that  were  absent  in  the  Ph.  papatasi  dataset. 

The highest number of sequences in the biological process category for both sand flies was
annotated as catabolic process (1,326, 15% - Ph. papatasi; 562, 12.7% - Lu. longipalpis). In
the Molecular Function category, the largest group of Ph. papatasi and Lu. longipalpis
sequences was annotated as nucleotide binding (1,412 or 16%, and 719 or 16.3%,
respectively). For
the  Cellular  Component  category,  a  high  number  of  sequences  were  annotated  as  protein 
complex  (1,898,  21.4%;  864,  19.6%)  (Figure  3). 
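
The percentages quoted for the top GO terms appear to be expressed relative to the number of GO-annotated sequences in each species (8,837 for Ph. papatasi and 4,411 for Lu. longipalpis). The short Python sketch below recomputes them under that assumption; the data structure and function name are illustrative, and small differences from the reported values can arise from rounding.

```python
# GO-annotated sequence totals per species, as reported in the text.
ANNOTATED = {"Ph. papatasi": 8_837, "Lu. longipalpis": 4_411}

# Sequence counts for the most populated term in each GO category.
TOP_TERMS = {
    "catabolic process": {"Ph. papatasi": 1_326, "Lu. longipalpis": 562},
    "nucleotide binding": {"Ph. papatasi": 1_412, "Lu. longipalpis": 719},
    "protein complex": {"Ph. papatasi": 1_898, "Lu. longipalpis": 864},
}


def percent_of_annotated(count, species):
    """Share of a species' GO-annotated sequences assigned to one term."""
    return 100 * count / ANNOTATED[species]


for term, counts in TOP_TERMS.items():
    for species, count in counts.items():
        share = percent_of_annotated(count, species)
        print(f"{term:20s} {species:16s} {share:5.1f}%")
```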

Digestive  proteins 

Trypsin  and  chymotrypsin-like  serine  endoproteases  involved  in  blood  digestion  have  been 
characterized  in  mosquitoes  and  other  blood  feeding  arthropods,  and  their  expression  level 
was  associated  with  the  type  and  the  time  elapsed  since  a  blood  meal  acquisition  (Ramalho- 
Ortigao  et  al.  2007,  Muller  et  al.  1993,  Noriega,  Wells  1999,  Pitaluga  et  al.  2009,  Telleria  et 
al.  2010,  Vizioli  et  al.  2001).  Serine  proteases  are  also  involved  in  many  other  key  processes 
in  insects  including  complex  cascades  of  proteases  involved  in  immune  signaling  pathways 
(Buchon et al. 2009). Serine proteases, like trypsin, are considered important in the
interaction of the sand fly host with Leishmania; trypsin activities are modulated in the
gut of both Ph. papatasi and Lu. longipalpis. Knockdown of a 'late' trypsin in Lu. longipalpis
enhanced the survival of Le. mexicana (Sant'Anna et al. 2009). Previously, four trypsins and
three chymotrypsins in Ph. papatasi (Ramalho-Ortigao et al. 2007, Ramalho-Ortigao et al.
2003) and two trypsins and five chymotrypsins in Lu. longipalpis (Telleria et al. 2010,
Jochim et al. 2008) were identified. Here we identify five new trypsin-like sequences,
PpTryp5a (JP544502) and PpTryp5b-e (JP542407, JP540627, JP554453,
JP544448); the latter may represent alleles of PpTryp5a because they have an identity over
95% at the amino acid level (Figure 4A). PpTryp5a has a similarity of 32-39% at the amino
acid level to known Ph. papatasi trypsins and a 50% identity to the closest related sequence
in the NR database, the Ae. aegypti trypsin. The newly identified trypsin sequences
(PpTryp5a-e) clustered closest to each other and to the Ae. aegypti trypsin rather than to any
of the six previously identified sand fly trypsins (Figure 4B). This grouping may be explained
by the difference in cDNA library sources; one possibility is that they are larval or pupal
specific genes. Previously, the trypsin sequences were identified using a library constructed
from the midgut of adult females only, while the library described here was constructed
using whole sand fly bodies at different life stages that included both sexes. The Ae. aegypti
trypsin sequence was identified from the entire genome sequence.
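
The working criterion used here for putative alleles is an amino acid identity above 95%. A minimal Python sketch of such a check on two pre-aligned sequences follows. It is an illustration under stated assumptions rather than the method used in this study: identity is computed over aligned, non-gap columns only (alignment tools differ in the denominator they use), and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100 * matches / compared


def may_be_allelic(aligned_a, aligned_b, threshold=95.0):
    """Flag a pair as putative alleles when identity exceeds the threshold."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Toy aligned peptides (hypothetical, 40 columns, one mismatch).
seq_1 = "IVGGYECTKHSQAHQVSLNSGYHFCGGSLINEQWVVSAGH"
seq_2 = "IVGGYECTKHSQAHQVSLNSGYHFCGGSLISEQWVVSAGH"
print(percent_identity(seq_1, seq_2))  # 97.5
print(may_be_allelic(seq_1, seq_2))    # True
```

Identity alone cannot distinguish alleles from recent paralogs, which is consistent with the cautious wording ("may represent alleles") used in the text.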

Seventeen novel chymotrypsin-like sequences with an identity of 29-69% to known Ph.
papatasi chymotrypsin sequences were identified in the current study. Five putative
chymotrypsin sequences, PpChym4a,b,5-7 (JP546634, JP554565, JP551370, JP547341,
JP554731), contained all the domains identified in other chymotrypsins (Ramalho-Ortigao et
al. 2007, Ramalho-Ortigao et al. 2003, Appel 1986, Park, Kwak 2008) except for two amino
acid differences in an otherwise highly conserved region. The first substitution, an S to F, is
present only in PpChym4b (position 292), while the second substitution, a P to A, is
present in all five sequences, PpChym4a,b,5-7 (position 293) (Figure 5A). Four sequences
representing two putative different chymotrypsins (PpChym14a and PpChym13;
PpChym14b and PpChym14c are likely alleles of PpChym14a) cluster close to the
previously identified PpChym2 (Ramalho-Ortigao et al. 2003), with which they share 66, 67
and 69% identity at the amino acid level. PpChym11 (including the two likely alleles
PpChym11a and PpChym11b) and PpChym12 cluster closer to a midgut chymotrypsin from
Glossina morsitans morsitans (Alves Silva et al. 2010). The last five of the novel putative
chymotrypsin sequences (four of which are likely alleles of PpChym8 and PpChym9) cluster
closer to a chymotrypsin-like serine protease from Chironomus riparius expressed in the
larval gut (Park, Kwak 2008). The closer clustering of novel Ph. papatasi chymotrypsin
sequences to sequences from different species could be explained by a possible difference in
expression location (tissue) and/or life stage between the known and the novel chymotrypsin
sequences, the known chymotrypsin sequences having been described in adult sand flies
(Ramalho-Ortigao et al. 2007, Ramalho-Ortigao et al. 2003) (Figure 5B).

Aminopeptidases  catalyze  the  removal  of  amino  acids  from  the  N-terminus  of  peptides  and 
proteins.  Their  expression,  along  with  the  expression  of  exopeptidases,  was  shown  to  be 
modified  by  the  ingestion  of  infected  blood  meal  in  sand  flies  (Muller  et  al.  1993,  Dillon, 
Lane 1993). Our analysis identified 45 Ph. papatasi sequences with high identity to different
aminopeptidases, including three aminopeptidase P sequences. Of these three sequences,
only one (PpAPP1, JP547392) had all the necessary conserved domains (Kulkarni,
Deobagkar 2002), with a 73% amino acid identity to Ae. aegypti proline-specific
aminopeptidase (e-value 10"'^^). The other two sequences (JP539747, JP552630) have a high
identity to PpAPP1 (over 98% identity spanning 100-144 amino acids) but lack one or
more  conserved  domains.  There  were  no  aminopeptidase  P  sequences  identified  in  Lu. 
longipalpis.  This  difference  between  the  two  species  is  more  likely  due  to  the  nature  of  the 
datasets  rather  than  the  loss  of  the  gene  from  Lu.  longipalpis.  We  also  have  identified  four 
sequences (JP41850, JP543532, JP547607, JP546724) with similarity to leucyl
aminopeptidase, two of which are unassembled mate reads; all four have a
higher than 97% identity at the amino acid level, making them potentially alleles of the same
sequence, PpAPL1 (JP541859). Four sequences with similarity to leucyl aminopeptidase
were identified in Lu. longipalpis, three of which are probably allelic variants of LlAPL1a,
which has 32% identity at the amino acid level to the leucyl aminopeptidase identified by
Dillon  et  al.  (Dillon  et  al.  2006)  (Figure  6).  It  was  not  possible  to  confirm  aminopeptidase 
identity  (using  conserved  domain  identification)  in  the  remaining  39  Ph.  papatasi  sequences 
with  homology  to  known  aminopeptidases  due  to  the  short  sequence  length  (76-585  AA). 
These  Ph.  papatasi  sequences  are  likely  to  represent  aminopeptidase  transcript  fragments 
rather than whole sequences (Supplemental Table 2).

Carboxypeptidases  also  are  enzymes  integral  to  protein  digestion  in  insects  (e.g.  mosquitoes) 
(Edwards  et  al.  2000).  In  sand  flies,  five  sequences  have  been  isolated  from  midgut,  two 
carboxypeptidase  B  (one  from  each  species)  and  three  carboxypeptidase  A  (one  from  Ph. 
papatasi and two from Lu. longipalpis) (Ramalho-Ortigao et al. 2007, Pitaluga et al. 2009).
Eighteen sequences (Supplemental Table 2) related to carboxypeptidase A were identified in
Ph. papatasi; these most likely represent transcript fragments. An additional two Ph. papatasi
sequences (JP552630, JP546271) with similarity to known carboxypeptidase B were
identified in this study. The new putative carboxypeptidase B sequences share a 38% and
54% identity, respectively, to known sand fly carboxypeptidases (Figure 7). While blood
meals are a preferred source of protein for sand flies, they can survive on a sugar diet.
Glucosidases are involved in carbohydrate digestion and their expression was shown to be
modified by both sugar and blood meals in Ph. langeroni (Dillon, El Kordy 1997). Here we
have identified 23 unique Ph. papatasi sequences with high identity (BLASTP, <1e-50) to
known glucosidases from mosquitoes.

Chitin  is  an  insoluble  polysaccharide  present  in  the  insect  cuticle,  peritrophic  matrix  and  the 
lining  of  the  foregut,  hindgut  and  trachea  (Zhu  et  al.  2008).  Chitinolytic  enzymes  are 
important  for  the  periodical  rearrangement  and  degradation  of  the  exoskeleton  and 
peritrophic  matrix.  To  date,  chitinases  have  been  identified  in  many  different  arthropod 
species including fruit flies (Dr. melanogaster) (Zhu et al. 2004), mosquitoes (Ae. aegypti,
An. gambiae) (De la Vega et al. 1998), silk moths (Koga et al. 1997), red flour beetles
(Tribolium castaneum) (Zhu et al. 2008) and sand flies (Ph. papatasi and Lu. longipalpis)
(Ramalho-Ortigao, Traub-Cseko 2003, Ramalho-Ortigao et al. 2005). The two known sand
fly chitinases have been identified in the midgut and implicated in Leishmania - sand fly
interactions (Ramalho-Ortigao, Traub-Cseko 2003, Ramalho-Ortigao et al. 2005). Seven
more sequences with identity to the previously identified chitinase from Ph. papatasi (51%)
and from related organisms, including Ae. aegypti and Dr. melanogaster (Ramalho-Ortigao
et al. 2005), were identified in Ph. papatasi in the current study. However, the search for
conserved domains (Zhu et al. 2008) has revealed these sequences to be more likely
transcript fragments rather than full-length chitinase sequences. Eight Ph. papatasi
sequences related to amylase were identified in this study. Thirty-one sequences with high
identity (BLASTP, <1e-50) to lipase also were identified in Ph. papatasi (Supplemental Table 2).

Immune response proteins

Insect immune responses have been most extensively studied in other dipteran insects,
including in Dr. melanogaster challenged with bacteria or fungi (Lemaitre, Hoffmann 2007)
and in mosquitoes challenged with Plasmodium parasites (Cirimotich et al. 2010, Richman
et al. 1997, Tahar et al. 2002) or bacteria (Dimopoulos et al. 1997, Hillyer, Schmidt &
Christensen 2004). There have been only a few studies regarding the immune response of
sand flies to parasite infection (Ramalho-Ortigao et al. 2007, Pitaluga et al. 2009, Jochim et
al. 2008). Here we have identified Ph. papatasi sequences belonging to three of the four
immune response pathways (Cirimotich et al. 2010, Tanaka et al. 2008), ranging from
recognition receptors to effector proteins. There were no significant matches found to
components of the JAK/STAT pathway in either sand fly.

Recognition proteins

Peptidoglycan recognition proteins (PGRPs), which are highly conserved across species
(Liu et al. 2001), bind peptidoglycan present in the cell walls of bacteria. To date PGRPs have
been identified in several arthropod species, including fruit flies (Werner et al. 2000), moths
(Kang et al. 1998), silkworms (Tanaka et al. 2008, Ochiai, Ashida 1999), mosquitoes
(Christophides et al. 2002) and sand flies (Pitaluga et al. 2009). Two new PGRP sequences
were identified in Ph. papatasi; one is a short singlet with high identity to the other sequence
(PpPGRP2, JP540873). PpPGRP2 has 44% identity to Ae. aegypti PGRP-LC and a 41%
and 44% identity to known sand fly PGRPs (Lu. longipalpis and Ph. papatasi, respectively)
(Jochim et al. 2008). Three additional sequences were identified with high similarity to
short-class PGRPs (PGRP-SC1-3, JP551327, JP547206, JP539467) (Figure 8).

β-glucan recognition proteins (βGRPs) are involved in recognition of Gram-negative
bacterial and fungal cell walls and in triggering the prophenoloxidase (PPO) cascade
(Yoshida, Ochiai & Ashida 1986, Ochiai, Ashida 2000). The βGRP class is composed of
two functionally distinct proteins: one that binds β-1,3-glucan and one that binds
Gram-negative bacteria. In Ph. papatasi we have identified three sequences with high
similarity to βGRPs, two (JP543291, JP552580) that likely represent two alleles of the same
sequence, PpBGRP1, whereas the third sequence (PpBGRP2, JP544368) is most likely a
transcript fragment. These three novel βGRPs have amino acid identity to βGRPs from the
pyralid moth Plodia interpunctella (37%) (Fabrick, Baker & Kanost 2003) and from the
mosquito Armigeres subalbatus (44%) (Wang et al. 2005). Three sequences (JP542600,
JP540290,  JP541890)  with  high  similarity  to  Gram-Negative  Binding  Proteins  (GNBPs) 
were also identified. No new sequences with similarity to βGRPs were identified in the Lu.
longipalpis dataset, LlBGRP1 and LlBGRP2 having previously been described by Dillon et
al.  (Dillon  et  al.  2006). 

C-type lectins are a large protein family with low conservation at the amino acid level; they
have diverse functions (Zelensky, Gready 2005), including pathogen recognition and
neutralization (Weis, Taylor & Drickamer 1998). We have identified seven C-type lectins
(PpCTL1-3) in Ph. papatasi with similarity to known C-type lectins from other
insects. Two are alleles of PpCTL1a (JP543952, JP44832, JP547369) and one is an allele of
PpCTL2a (JP543073, JP55083). PpCTL3a (JP554123) and PpCTL3b (JP540156) share a
high identity at the amino acid level (over 95%); however, PpCTL3b is 139 amino acids longer
than  PpCTL3a.  There  were  no  sequences  with  high  identity  to  known  C-type  lectins 
identified  in  Lu.  longipalpis. 

Three  classes  of  the  scavenger  receptor  protein  family  serve  as  recognition  receptors  in 
immune  responses  (Peiser,  Mukhopadhyay  &  Gordon  2002,  Kiefer  et  al.  2002,  Pierini  2006, 
Ramet  et  al.  2001).  Dillon  et  al.  (Dillon  et  al.  2006)  previously  identified  proteins  belonging 
to  two  of  the  three  classes  (B  and  C)  in  Lu.  longipalpis.  Here  we  added  six  Ph.  papatasi 
sequences  with  identity  to  class  B  scavenger  receptors.  Three  of  the  new  Ph.  papatasi 
putative scavenger receptors (PpSRB1b-d, JP543420, JP543780, JP539722) may represent
alleles of PpSRB1a (JP541894). PpSRB1a-d and PpSRB2 (JP543210) share an identity of
29-31% to the known Lu. longipalpis scavenger receptors and 44-49% identity to An.
gambiae scavenger receptor class B, while PpSRB3 (JP553019) has an identity of 96% to
NSFM-84c11.q1k (SRB) from Lu. longipalpis and 80% to An. gambiae scavenger B
receptor. 

Galectins  have  been  implicated  in  cell  adhesion  (Ochieng,  Leite-Browning  &  Warfield 
1998),  apoptosis  (Perillo  et  al.  1997),  and  immune  response  (Tanaka  et  al.  2008)  and,  for 
sand  flies,  in  species-specific  binding  of  Leishmania  parasites  to  the  sand  fly  midgut 
(Kamhawi  et  al.  2004).  Seven  sequences  related  to  galectins  were  identified  in  Ph.  papatasi. 
One of these novel sand fly galectins is most likely an allele of the known Ph. papatasi
galectin (PpGal1 (JP539532), previously identified as PpGalec [75]). Four more sequences
(PpGal2a-d, JP540648, JP546602, JP550066, JP540193) have less than 95% identity at the
amino acid level to PpGal1; three of these (PpGal2b-d) are alleles of PpGal2a (Figure 9A).
The remaining two galectins (PpGal3-4, JP5484429, JP549531, JP543439) share less than
39% identity to PpGal1. The novel galectins share a higher identity to related sequences
from Lu. longipalpis (PpGal2a - 56%, PpGal3 - 87%, PpGal4 - 73%), than to mosquito
galectins: An. gambiae (36%, 47%, 48%), Ae. aegypti (40%, 50%, 60%) and Cu.
quinquefasciatus  (37%,  45%,  57%).  The  previously  published  galectins  A-D  (Dillon  et  al. 
2006)  were  identified  in  our  Lu.  longipalpis  analysis,  but  no  additional  galectins  were 
identified  (Figure  9). 

Thioester-containing  proteins  (TEPs)  are  homologs  of  the  complement  system  (Blandin, 
Levashina  2004)  and  have  been  implicated  in  phagocytosis  of  gram-negative  bacteria  in 
insects  (Cirimotich  et  al.  2010,  Tanaka  et  al.  2008,  Blandin,  Levashina  2007).  Three 
sequences  (JP540471,  JP544554,  JP555283)  with  identity  to  mosquito  TEPs  were  identified 
in  Ph.  papatasi.  One  of  these  TEP  sequences  is  a  contig  while  two  are  singlets  of  short  length 
and  likely  represent  transcript  fragments.  The  contig  shares  98%  of  its  amino  acids  with  one 
singlet  and  93%  with  the  other,  making  them  potentially  alleles  of  the  contig.  In  the  Lu. 
longipalpis  dataset,  two  sequences  related  to  mosquito  TEPs  were  identified,  one  singlet  and 
one contig. The novel Lu. longipalpis sequences share 72% and 86% identity,
respectively, with the TEP sequence identified by Dillon et al. (Dillon et al. 2006). The novel
Ph. papatasi TEP contig has an identity of 85% and 89% to the novel Lu. longipalpis TEPs
and a 60% identity to Ae. aegypti TEPs. The Ph. papatasi TEP singlets have an identity of
89% and 86%, respectively, for one singlet and 78% and 87%, respectively, for the other to
the novel Lu. longipalpis TEP sequences, and 69% and 53% identity to the Ae. aegypti TEP
sequence.

Signaling  pathways  proteins 

The  Toll  signaling  pathway  plays  a  key  role  in  the  establishment  of  the  dorso-ventral  axis  of 
the  Drosophila  embryo  and  also  is  activated  in  response  to  microbial  infection  (Cirimotich  et 
al.  2010,  Anderson  2000).  Immune  responses  through  this  pathway  are  activated  by  the 
recognition  of  a  pathogen  by  the  PGRPs  that  activates  a  serine  protease  cascade  culminating 
with activation of the cytokine-like protein, Spatzle. Subsequent signaling involves MyD88,
Tube and Pelle, resulting in the degradation of Cactus and the release of Dorsal, a Rel1
protein, from its complex with Cactus (Anderson 2000, Michel et al. 2001). Five sequences
with identity to mosquito Spatzle were identified in sand flies: three in Ph. papatasi and two
in Lu. longipalpis. Two of the Ph. papatasi Spatzle-like sequences are non-overlapping
fragments of the same transcript. The two Lu. longipalpis Spatzle-like sequences have an
identity of 99% to each other, making them alleles, and 30% identity to the known Lu.
longipalpis Spatzle (Dillon et al. 2006). The three novel sand fly Spatzle-like sequences have
an identity of 61%, 68% and 68% to Ae. aegypti proteins (Nene et al. 2007). The Ph. papatasi
sequence identified as MyD88-like has 45% identity to Cu. quinquefasciatus MyD88 and is
most likely a fragment, as it is short (122 AA) and a singlet. The absence of MyD88-like
sequences  from  the  Lu.  longipalpis  dataset  may  be  explained  by  the  lower  number  of 
sequences  available  for  this  sand  fly.  The  four  sand  fly  homologs  to  Toll-interacting  protein 
(Tollip)  (three  in  Lu.  longipalpis  and  one  in  Ph.  papatasi)  have  a  97%  identity  to  each  other 
and 68% to Tr. castaneum Tollip protein. Pellino, another component of the Toll pathway, is
known to bind phosphorylated Pelle/IRAK and enhance innate immunity in Drosophila fruit
flies (Haghayeghi et al. 2010). Here we identified two Ph. papatasi sequences with 91%
identity to the Dr. melanogaster Pellino sequence; however, these novel sequences are from a
single transcript, as they are unassembled mate reads (Supplemental Table 1).

The immunodeficiency (IMD) pathway is similar to the mammalian TNF receptor signaling
pathway (Valanne et al. 2007, Lemaitre, Hoffmann 2007) and is activated by gram-negative
bacteria in Dr. melanogaster (Tanaka et al. 2008). Several sequences with homology to
components  of  this  pathway  were  identified  in  Ph.  papatasi  including  five  sequences  with 
identity  to  inhibitor  of  apoptosis  protein  2  (IAP2),  a  protein  required  for  antimicrobial 
peptide  expression  in  fruit  flies  (Valanne  et  al.  2007).  Four  of  the  novel  IAP2  sequences  are 
most likely alleles of one protein, the fifth being rather short (96 AA) and likely representing a
transcript fragment. The sand fly IAP2 sequences have an identity of 44% to the Gl. morsitans
morsitans IAP sequence (Attardo et al. 2006). One sequence related to IKKβ was also
identified in Ph. papatasi (Supplemental Table 1).

While  the  first  two  signaling  pathways  are  well  characterized  in  insect  species,  there  is  less 
information  concerning  the  c-Jun  NH2-terminal  kinase  (JNK)  pathway  in  Diptera  (Agaisse, 
Perrimon  2004).  The  JNK  signaling  pathway  is  formed  by  genes  involved  in  wound  repair, 
stress  repair,  and  negative  feedback  control  of  antimicrobial  peptides  (Botella  et  al.  2001, 
Ramet et al. 2002). A component of the IMD pathway, TAK1, a MAPK, has been
implicated in activation of this signaling cascade (Silverman et al. 2003). One sequence
related to Hem, a component of the JNK pathway, was identified in each sand fly, but the two shared
no significant identity with each other. Three Fos sequences were identified in the current
study: one in Lu. longipalpis and two in Ph. papatasi, with an identity of 80% between the two
sand flies (the two Ph. papatasi sequences are alleles). Two sequences with identity to the Cu.
quinquefasciatus Jun sequence, another JNK pathway component, also were identified in
sand flies (one each) (Supplemental Table 1).

Effector  proteins 

In insects, prophenoloxidases (PPOs) are involved in melanization, an efficient immune
response activated by recognition of lipopolysaccharides, peptidoglycan, and β-1,3-glucans
(Soderhall, Cerenius 1998, Cerenius, Soderhall 2004). Ten sequences related to PPOs were
identified  in  Ph.  papatasi.  Only  four  of  these  ten  sequences  have  high  identity  (blastp,  10"^*^) 
to related PPO sequences from other organisms. The four PPOs identified here most likely
represent three unique transcripts, with PpPPO1 having two alleles (PpPPO1a (JP539467)
and PpPPO1b (JP550019)). PpPPO1a, PpPPO1b and PpPPO2 (JP549395) have an identity of 44%,
52% and 63%, respectively, to Ae. aegypti PPO (XP_001648968.1), while PpPPO3
(JP545596) has an identity of 64% to a different Ae. aegypti PPO (XP_001661891.1). The
reduced level of identity of the last six putative PPOs from Ph. papatasi could be due to their
short length (the determined ORF encodes a protein of less than 300 amino acids). Thirteen
sequences with similarity to PPOs were identified in Lu. longipalpis. Of these sequences,
four are likely alleles of one PPO (LlPPO1) and three are alleles of a second PPO sequence
(LlPPO2), with the LlPPO2c allele being formed by an unassembled mate pair. The other six
PPOs present in Lu. longipalpis share an identity of less than 64% at the amino acid level with the
previously mentioned Lu. longipalpis PPOs (LlPPO1a-d and LlPPO2a-c). The novel
putative Ph. papatasi PPOs have varying degrees of identity to their closest Lu. longipalpis
counterparts: PpPPO1a, 78%; PpPPO2, 41%; and PpPPO3, 37%.

Another type of protein implicated in immune response in mosquitoes has leucine-rich
repeats (LRR) and in An. gambiae is involved in bacterial phagocytosis and parasite
melanization (Povelones et al. 2009). Here we have identified five LRR-containing
sequences in Ph. papatasi, two of which are likely alleles of the same gene, and two Lu.
longipalpis sequences, with an amino acid identity of 46% and 48% to Ae. aegypti and 68% to
Gl. morsitans morsitans LRR-containing proteins. Lysozymes have been implicated in both
digestion and immune response in mosquitoes (Ursic Bedoya et al. 2005). We identified two
different lysozyme sequences from Ph. papatasi with identity to known insect lysozymes.

Peritrophic matrix (PM)

The  PM  is  a  non-membranous  extracellular  layer  that  surrounds  the  food  bolus  in  insects.  To 
date  it  has  been  best  characterized  in  insects  of  medical  or  agricultural  importance.  In 
phlebotomine  sand  flies,  there  are  two  different  types  of  peritrophic  matrices  depending  on 
the life stage of the fly: PM type 1 (PM1) is present during the adult life stage, while PM
type 2 (PM2) is present during larval life stages (Marquardt et al. 2005). The synthesis and
degradation of PM1 have been related to the ingestion of blood in Lutzomyia sand flies, with
the PM being well formed 24 h after a blood meal (Secundino et al. 2005). Functionally,
PM1 has been implicated in both digestion (Tellam, Wijffels & Willadsen 1999, Shao,
Devenport & Jacobs-Lorena 2001) and in Leishmania-sand fly interaction. Glutamine
synthetase is involved in the synthesis of chitin, a major component of the PM. We identified nine Ph.
papatasi sequences related to glutamine synthetase with identity to sequences from other
flies (56-96%, Lu. longipalpis; ll-%\%, Dr. melanogaster; ll%, Gl. morsitans morsitans;
70-75%, Ae. aegypti), six of which may represent alleles of one sequence.

Three sequences with identity to Dr. melanogaster hemomucin, a molecule present in the
fruit fly PM and involved in induction of antibacterial effector molecules (Theopold et al.
1996), were identified in Ph. papatasi. These sequences likely represent a single transcript,
with one sequence containing the signal peptide and the other two containing all the
conserved glycosylation sites. Integrins are involved in cell-cell and cell-matrix interactions
and have been implicated in phagocytosis of bacteria in An. gambiae (Moita et al. 2006).
Five Ph. papatasi sequences were identified with high identity to β-integrins from insects,
with an amino acid identity between 53% (Ae. aegypti) and 91% (Plutella xylostella). The
length of these sequences, combined with the fact that they are either singlets or single mate pairs,
suggests that they might be fragments of the β-integrin present in Ph. papatasi. Peritrophins are
important components of the insect PM (Lehane 1997). We identified one novel sequence with
71% identity to Dr. melanogaster peritrophin A and 25% identity to known Ph. papatasi
peritrophins (Ramalho-Ortigao et al. 2007).

Quantitative mRNA analysis of candidate genes

Fourteen novel Ph. papatasi sequences were selected for expression analysis. These
sequences, which encode the novel putative trypsin, chymotrypsin and PGRP proteins,
were chosen for their potential role in digestion and immune response. The
expression of three previously described transcripts (PpTryp1, PpChym2
and PpPGRP) was also analyzed in order to validate our results.

Of the novel putative Ph. papatasi genes, PpChym4a-12, PpTryp5a and PpPGRP-SC1
(Figure 10) are expressed at a higher level in the immature life stages, indicating that they
may be involved in digestion, as was shown for larval and pupal trypsin and chymotrypsin in
Ae. aegypti (Yang, Davies 1971), where their roles were thought to be in food digestion and
immune response, respectively. Previously, the expression of PpChym2 was shown to be
influenced by the presence of a blood meal (Ramalho-Ortigao et al. 2003). In this study we
demonstrate that while PpChym2 is expressed in both adult and immature life stages, it has a
much higher level of expression in late larval stages than upon blood feeding
(Figure 10). Unlike PpChym2, PpTryp1 is not expressed in the larval or pupal stages and, as
demonstrated by Ramalho-Ortigao et al. (2003), is downregulated
upon blood meal (results not shown).

In blood feeding mosquitoes, there are two types of chymotrypsins whose
expression level is influenced by taking a blood meal: early chymotrypsin, which is highly
expressed for the first hours post blood meal in An. gambiae (Shen, Edwards & Jacobs-Lorena
2000), and late chymotrypsin, which in An. gambiae has a peak expression at around
30 h post blood meal (Vizioli et al. 2001). One late chymotrypsin has been previously
identified in Ph. papatasi, PpChym2, with a peak level of expression at 30 h post blood
meal (Ramalho-Ortigao et al. 2003). Here we have identified two other novel putative
chymotrypsins that are expressed post blood meal, PpChym13 and PpChym14a,
indicating a possible involvement of these proteins in blood digestion. While PpChym14a is
also expressed in immature stages, though at much lower levels, PpChym13 is expressed
only upon blood feeding (Figure 11 A&B).

PpPGRP and PpPGRP2 are two other Ph. papatasi genes potentially involved in immune
response whose expression was analyzed in this study. The previously identified PpPGRP
(Ramalho-Ortigao et al. 2007, Jochim et al. 2008) is shown here to be expressed in both
immature and adult life stages, with a peak of expression during larval stages, suggesting a
possible involvement in larval immune response (Figure 12 A&C). PpPGRP2 has an
expression profile in the larval stages similar to that of PpPGRP, with a peak of expression
during the fourth instar (L4). However, this transcript is also upregulated following a blood meal
with or without Le. major (Figure 12 B&D).

Of the 71 Ph. papatasi sequences addressed in the study, 30 sequences share over 95%
amino acid identity with another Ph. papatasi sequence, possibly representing allelic
variation.  The  high  number  of  alleles  present  is  suggestive  of  high  genetic  heterozygosity 
within  the  colony  utilized  for  library  construction  in  spite  of  the  fact  that  this  colony  has 
been  maintained  under  laboratory  conditions  since  the  early  1970s  with  substantial 
inbreeding.  Furthermore,  we  developed  an  assembly  process  that  favored  a  more  compacted 
assembly  than  otherwise  possible,  indicating  the  possibility  of  gene  duplication  in  Ph. 
papatasi.  The  existence  of  a  high  degree  of  polymorphism  in  the  population  is  possible  but 
further  tests  are  necessary  for  confirmation. 
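
The 95% identity criterion described above can be illustrated with a minimal Python sketch. It assumes that the sequences have already been aligned to equal length with "-" marking gaps and that identity is counted only over columns where neither sequence has a gap; the text does not state how identity was computed, so this definition, the function names and the toy sequences are all hypothetical.

```python
from itertools import combinations


def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def putative_allele_pairs(aligned, threshold=95.0):
    """Return (name_a, name_b, identity) for pairs sharing more than `threshold` % identity."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(aligned.items()), 2):
        identity = percent_identity(seq_a, seq_b)
        if identity > threshold:
            pairs.append((name_a, name_b, identity))
    return pairs


# Hypothetical toy alignment (40 residues each), not data from the study.
base = "ACDEFGHIKLMNPQRSTVWY" * 2
toy_alignment = {
    "seqA": base,
    "seqB": "G" + base[1:],  # one substitution relative to seqA
    "seqC": base[::-1],      # unrelated sequence
}
allele_pairs = putative_allele_pairs(toy_alignment)
print(allele_pairs)
```

In this toy case only seqA and seqB exceed the threshold, so they would be flagged as possible alleles of one gene, while seqC would be treated as a distinct sequence.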

To  estimate  colony  genetic  heterozygosity  we  quantified  the  number  of  SNPs  present  in 
each  contig,  requiring  that  each  qualifying  position  was  covered  by  at  least  four  ESTs  and 
that  the  two  most  common  alleles  were  present  in  at  least  two  ESTs.  Using  these  criteria, 
SNPs  were  discovered  in  1285  contigs,  comprising  7.1  Mb  of  sequence.  We  identified  6312 
SNPs  and  estimated  8.88  SNPs  per  1,000  bases.  This  number  is  slightly  greater  than  the 
SNP density for colonized An. funestus mosquitoes (Wondji, Hemingway & Ranson 2007)
but lower than that found in the butterflies Melitaea cinxia (Vera et al. 2008) and Papilio
zelicaon (O'Neil et al. 2010), indicating a relatively low level of heterozygosity.

Conclusions 

This  work  represents  the  first  global  transcriptome  study  of  the  sand  fly  Ph.  papatasi,  the 
principal vector of Le. major in North Africa and the Middle East, and has resulted in the
identification of novel sequences involved in parasite-vector interaction that could be targets
for future vector control methods. Furthermore, this EST library will be an essential resource
during annotation of the two phlebotomine sand fly genomes, which is currently under way
(McDowell et al. 2006).

Experimental  procedures 

Library  construction 

A  normalized  cDNA  library  (Soares  et  al.  2009)  was  constructed  from  a  Ph.  papatasi  colony 
(Israeli  strain)  maintained  at  the  Walter  Reed  Army  Institute  of  Research  (WRAIR).  This 
colony was originally established in the 1970s and has been subjected to several
bottlenecks; thus it is thought to display low levels of genetic polymorphism. Using an
RNeasy Mini Kit (Qiagen), total RNA was collected from the four larval instars, pupae,
and adult males and females 1, 3, and 10 days post emergence (adults fed only on sugar).
Additional RNA was included from females 6, 12, 24, 36, 48, 72, 94 and 120 hours post
feeding on uninfected and Le. major (strain Friedlin V1) infected mouse blood. A total of 29
samples were collected, with RNA amounts per sample ranging from 700 ng (first instar larvae,
or L1) to 20 µg (48 h post blood meal, single females). RNA samples were checked on an
agarose gel (1% MOPS with 5% formaldehyde). Library construction was completed with
pooled RNA, using equal amounts from each of the 29 samples, by Express Genomics
(Baltimore, MD), and EST sequencing was performed by the Genome Institute at
Washington University (St. Louis, MO).

Sequencing, processing, and assembly

Sequences were generated from the normalized cDNA library using the Sanger-based
dideoxy chain termination method. Direct colony sequencing using standard high throughput
sequencing methods was performed using a DNA track Robot and ABI 3730 sequencers. Ph.
papatasi sequencing chromatograms were used to generate sequence and quality score files
using the Phred/Phrap program suite (Ewing et al. 1998, Ewing, Green 1998). Lu. longipalpis
sequences and quality scores were obtained in FASTA format from Dillon et al. (2006).
Cleaning and filtering of sequence files was performed in three steps. First,
contaminant sequences (human, protozoa, bacteria, mouse or rat) were removed using the
BLASTX algorithm (Altschul et al. 1990). A sequence was considered to be a contaminant
if 12 of the first 20 best hits were to any of the organisms listed above. Subsequently, low
quality regions were trimmed using Lucy2 (Chou, Holmes 2001) with default parameters
and the pExpress-1 cloning vector. Lastly, poly(A) tails and small contaminant sequences
were removed using the Seqclean algorithm available from The Institute for Genomic
Research (TIGR Gene Indices Clustering Tools) (Chen et al. 2007). Only sequences longer
than 100 bp were used for further analysis. As a comparison, we also analyzed a previously
generated Lu. longipalpis EST data set (Dillon et al. 2006). To eliminate sequence cleaning
and assembly biases, the Lu. longipalpis data (first described by Dillon et al. (2006))
were processed using the same programs and parameters as the Ph. papatasi dataset.
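
The contaminant rule and the length cutoff described above can be sketched in Python as follows. This is only an illustration under stated assumptions: each BLASTX hit is assumed to have already been labeled with a broad source category, the rule is read as "at least 12" of the first 20 hits, and the function names and category labels are hypothetical rather than part of the original pipeline.

```python
# Broad source categories treated as contaminants in the text.
CONTAMINANT_TAXA = {"human", "protozoa", "bacteria", "mouse", "rat"}


def is_contaminant(hit_taxa, top_n=20, min_contaminant_hits=12):
    """Apply the 12-of-the-first-20-hits rule to a ranked list of hit categories.

    `hit_taxa` lists the source category of each BLASTX hit, best hit first
    (a hypothetical, pre-labeled input).
    """
    top_hits = hit_taxa[:top_n]
    contaminant_hits = sum(1 for taxon in top_hits if taxon in CONTAMINANT_TAXA)
    return contaminant_hits >= min_contaminant_hits


def keep_sequence(sequence, hit_taxa, min_length=100):
    """Keep sequences that are not flagged as contaminants and are longer than 100 bp."""
    return not is_contaminant(hit_taxa) and len(sequence) > min_length


# Hypothetical examples, not data from the study.
mostly_bacterial = ["bacteria"] * 12 + ["insect"] * 8
mostly_insect = ["bacteria"] * 11 + ["insect"] * 9
print(is_contaminant(mostly_bacterial), is_contaminant(mostly_insect))
print(keep_sequence("A" * 101, mostly_insect), keep_sequence("A" * 100, mostly_insect))
```

In the pipeline described in the text, the length cutoff is applied after trimming with Lucy2 and Seqclean; the sketch folds both checks into one function only for brevity.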

Sequences were assembled into contigs using the CAP3 assembler (Huang, Madan 1999).
Overlapping mate pairs were assembled and a consensus obtained using relatively relaxed
criteria (p 80, o 20, h 95). The consensus sequences obtained were then assembled with the
remaining ESTs using more stringent parameters (p 95, o 50, h 30) (Figure 1). The resulting
sequences are referred to as assembled sequences.
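
As a rough sketch of the two-tier assembly, the Python snippet below builds the two CAP3 command lines. It assumes that the letters p, o and h in the text correspond to CAP3's -p (overlap percent identity cutoff), -o (overlap length cutoff) and -h (maximum overhang percent length) options; the input file names are hypothetical, and the commands are only constructed here, not executed.

```python
def cap3_command(fasta_path, identity, overlap, overhang):
    """Build a CAP3 command line as an argument list (the command is not run here)."""
    return [
        "cap3", fasta_path,
        "-p", str(identity),   # overlap percent identity cutoff
        "-o", str(overlap),    # overlap length cutoff
        "-h", str(overhang),   # maximum overhang percent length
    ]


# Parameter values taken from the text; file names are hypothetical placeholders.
FIRST_TIER = {"identity": 80, "overlap": 20, "overhang": 95}
SECOND_TIER = {"identity": 95, "overlap": 50, "overhang": 30}

first_tier_cmd = cap3_command("mate_pairs.fasta", **FIRST_TIER)
second_tier_cmd = cap3_command("consensus_plus_remaining_ests.fasta", **SECOND_TIER)

print(" ".join(first_tier_cmd))
print(" ".join(second_tier_cmd))
```

Each argument list could be passed to `subprocess.run` on a system where CAP3 is installed. Keeping the two parameter sets side by side makes the difference between the relaxed mate-pair step and the more stringent second step explicit.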

GO  annotation  and  similarity  searches 

Biological  functions  for  the  Ph.  papatasi  and  Lu.  longipalpis  assembled  sequences  were 
assigned  using  Blast2GO  (Conesa  et  al.  2005).  InterProScan  analysis  also  was  performed  as 
part  of  the  GO  annotation  process.  Annotated  genes  were  split  into  the  three  main  GO 
categories:  biological  process,  molecular  function  and  cellular  component. 

Similarity  to  known  proteins 

Similarity  searches  to  known  sequences  were  performed  by  using  BLAST  with  an  e-value 
limit  of  10"^  against  the  National  Center  for  Biotechnology  Information  (NCBI)  non- 
redundant  protein  database  (NR).  Protein  sequences  involved  in  vector-parasite  interactions, 
specifically  sequences  involved  in  blood  digestion,  immune  response,  and  peritrophic  matrix 
composition  were  downloaded  from  NCBI  (http://www.ncbi.nlm.nih.gov/protein)  and 
ImmunoDB (http://cegg.unige.ch/Insecta/immunodb/) for available arthropod species. Ph.
papatasi sequences with significant similarity (BLASTX, e-value 10‘^°) to these restricted
datasets were further analyzed for presence of conserved domains using ScanPROSITE
(Gattiker, Gasteiger & Bairoch 2002) and manual searches. For manual domain searches,
sand fly assembled sequences were translated into amino acid sequences using the prot4EST
pipeline (Wasmuth, Blaxter 2004) and then aligned to known proteins using Muscle (Edgar
2004) with default parameters. Alignments were viewed using Seaview (Gouy, Guindon &
Gascuel 2010) and Jalview (Waterhouse et al. 2009). Maximum likelihood trees were built
for full sand fly sequences using sequences from related organisms and the MEGA5
software (using the WAG model; bootstrap value: 1000) (Tamura et al. 2011).

Expression  analysis  of  predicted  Ph.  papatasi  proteins 

An expression analysis was performed on 14 novel putative Ph. papatasi proteins involved
in digestion and immune response (1 trypsin, 11 chymotrypsins, 2 PGRPs). Total RNA was
isolated from 5 different Ph. papatasi immature stages (L1-P), females and males 1-3 days old,
and blood fed females at 24 h, using the RNeasy Mini Kit (Qiagen). For proteins with a high
expression at 24 h post blood meal, additional samples were extracted from females at 3 h,
36 h and 72 h post blood meal and post Le. major infected blood meal. The DNase
(Fermentas) treated RNA was used to generate cDNA using Superscript III (Invitrogen) and
oligo(dT)12-20. Quantitative PCR was performed using SYBR Green (ABI), an ABI 7900
RT-PCR system and 20 ng of cDNA per sample. Primer sets were designed for each
sequence of interest such that only one sequence was amplified (Supplemental Table 3);
however, the high level of identity between the potentially different alleles made it impossible
to distinguish between them. The 60S ribosomal protein L10 was used as an internal
control. Reactions for each gene and for the control were carried out in triplicate.
The relative expression level of each gene was determined by the $\Delta\Delta C_T$ method, where
relative expression is expressed as a fold difference relative to sugar fed females and
calculated as $2^{-\Delta\Delta C_T}$. The following formulas were used:

$$\Delta\Delta C_T = \Delta C_{T(\text{stage or condition})} - \Delta C_{T(\text{sugar fed females})}$$

$$\Delta C_T = C_{T(\text{gene of interest})} - C_{T(\text{60S RNA})}$$

Average Ct values for all samples can be found in Supplemental Tables 4 and 5.
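
The relative expression calculation described above can be worked through with a short Python sketch. It assumes that the triplicate Ct values for each gene are averaged before the differences are taken, which the text does not state explicitly; the Ct values and variable names below are invented for illustration only.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Delta Ct: mean Ct of the gene of interest minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_target, sample_ref, calibrator_target, calibrator_ref):
    """Relative expression as 2^(-ddCt), with sugar fed females as the calibrator."""
    ddct = delta_ct(sample_target, sample_ref) - delta_ct(calibrator_target, calibrator_ref)
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values (not data from the study).
blood_fed_gene = [22.0, 22.5, 21.5]   # gene of interest, blood fed females
blood_fed_l10 = [18.0, 18.0, 18.0]    # 60S ribosomal protein L10, blood fed females
sugar_fed_gene = [25.0, 25.5, 24.5]   # gene of interest, sugar fed females
sugar_fed_l10 = [18.0, 18.0, 18.0]    # 60S ribosomal protein L10, sugar fed females

relative_expression = fold_change(blood_fed_gene, blood_fed_l10,
                                  sugar_fed_gene, sugar_fed_l10)
print(relative_expression)
```

With these invented values the Delta Ct is 4 in blood fed females and 7 in sugar fed females, so the Delta Delta Ct is -3 and the transcript would be reported as 8-fold more abundant than in sugar fed females.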

Heterozygosity analysis

Single nucleotide polymorphisms (SNPs) were identified
in the sand fly EST assemblies as described by O'Neil et al. (2010). Specifically, the ACE
output of each CAP3 assembly was imported into an AMOS bank (Pop et al. 2004) for
programmatic access to underlying read information. SNPs were called using the “loose”
criterion, which required that the two most common alleles be found in at least two distinct
ESTs (O'Neil et al. 2010). To estimate heterozygosity, we used AMOS to count both the
number of loose criterion SNPs and the number of positions covered by at least four ESTs in
each contig. Average heterozygosity was then computed as the total number of SNPs
divided by the total number of qualifying positions. This simple method was chosen over the
Beta statistic (Novaes et al. 2008) because we were not interested in the population genetics
underlying these two colonies, and because highly covered arthropod EST contigs tend to be less
diverse (O'Neil et al. 2010), justifying the use of simpler SNP criteria.
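
The heterozygosity estimate described above can be sketched in Python. The sketch assumes that each contig is available as a list of alignment columns, where a column is a string of the base calls from every EST covering that position, and it reads the "loose" criterion as requiring each of the two most common alleles to appear in at least two ESTs. The study obtained the underlying read information from an AMOS bank; the data structure, function names and toy columns here are hypothetical.

```python
from collections import Counter


def is_loose_snp(bases, min_allele_count=2):
    """True if the second most common allele is seen in at least `min_allele_count` ESTs."""
    top_two = Counter(bases).most_common(2)
    # The most common allele is at least as frequent as the second one,
    # so checking the second allele covers both.
    return len(top_two) == 2 and top_two[1][1] >= min_allele_count


def average_heterozygosity(contigs, min_coverage=4):
    """Return (SNP count, qualifying positions, SNPs per qualifying position)."""
    snps = 0
    qualifying = 0
    for columns in contigs:
        for column in columns:
            bases = [b for b in column.upper() if b in "ACGT"]  # ignore gaps and Ns
            if len(bases) < min_coverage:
                continue  # position not covered by at least four ESTs
            qualifying += 1
            if is_loose_snp(bases):
                snps += 1
    rate = snps / qualifying if qualifying else 0.0
    return snps, qualifying, rate


# Hypothetical toy contigs, not data from the study.
toy_contigs = [
    ["AAAA", "AAGG", "AAAG", "AAA"],  # one SNP; the last column has coverage below 4
    ["CCTTT", "GGGGG"],               # one SNP
]
snp_count, position_count, heterozygosity = average_heterozygosity(toy_contigs)
print(snp_count, position_count, heterozygosity)
```

Multiplying the resulting rate by 1,000 gives a density in SNPs per 1,000 bases, the unit used when the estimate is reported in the text.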

Supplementary  Material 

Refer to Web version on PubMed Central for supplementary material.

Figure 1. Assembly process flowchart

Boxes indicate a state of the sequences in the pipeline, italic lettering indicates
modifications applied to the sequences, and arrows indicate the direction in which the sequences
move down the pipeline.

Figure 2. Assembled sequence length distribution

Distribution of sequence length for sequences with NR or InterPro hits and for sequences
with NR or InterPro hits and GO annotation for Ph. papatasi (A, B) and Lu. longipalpis (C, D).

Figure 3. Gene Ontology terms distribution

Distribution  of  Ph.  papatasi  (inner  circle)  and  Lu.  longipalpis  (outer  circle)  sequences  for  the 
three main GO categories: Biological Process (A), Molecular Function (B) and Cellular
Component (C).

Figure 4. Ph. papatasi novel trypsin sequences

(A) Novel Ph. papatasi trypsin sequence alignment; conserved residues are represented by
a darker shading while mismatches between the five sequences are indicated by boxes. Full
sequence alignment is available in supplemental materials (S5). (B) Phylogenetic analysis of
trypsin amino acid sequences from Ph. papatasi (Pp: PpTryp5a-e (JP544502, JP542407,
JP540627, JP554453, JP544448), AAM96940.1, AAM96941.1, AAM96942.1,
AAM96943.1), Lu. longipalpis (Ll: ABM26904.1, ABM26905.1, ABV60308.1,
ABV60300.1), An. gambiae (Ag: CAA80512.1, CAA79328.1, CAA80517.1, CAA80515.1,
CAA80514.1, CAA80513.1, CAA80516.1), An. stephensi (As: AAB66878.1,
AAA97479.1), Ae. aegypti (Aa: EAT40684.1, EAT42808.1, EAT36350.1, EAT34033.1,
EAT42007.1, EAT42008.1, EAT42004.1, EAT37859.1), Cu. quinquefasciatus (Cq:
EDS34988.1) and Cu. pipiens pallens (Cpp: AAK67462.1). The WAG substitution model
was used with variable positions and a bootstrap value of 1000 (only those above 50 are
represented on the trees). The scale represents the rate of amino acid substitution per site.
The novel Ph. papatasi trypsin sequences (PpTryp5a-e) are indicated in bold.

Figure 5. Ph. papatasi novel chymotrypsin sequences

(A) Multiple sequence alignment fragment depicting the chymotrypsin binding pocket,
where the pockets with the P-to-A substitution are boxed; the residue concordance at each
position is indicated by the different degree of shading at that position. Full sequence
alignment is available in supplemental materials (S6). (B) Phylogenetic analysis of known
and novel chymotrypsins from Ph. papatasi (Pp: PpChym4-14 (JP546634, JP554565,
JP551370, JP547341, JP554731, JP549601, JP540150, JP554746, JP554644, JP540516,
JP543908, JP549141, JP547864, JP543702, JP542007, JP548909, JP549999),
AAM96938.1, AAM96939.1, ABV44728.1), Lu. longipalpis (Ll: ABV60294.1,
ABV60293.1, ABV60309.1, ABV60291.1, ABV60292.1, ABV60301.1), An. gambiae (Ag:
CAA83568, CAA83567), An. darlingi (Ad: ADD17493.1, ADD17494.1), Ae. aegypti (Aa:
XP_001663061.1, EAT32679.1, EAT38422.1, AAL93243.1), Cu. quinquefasciatus (Cq:
XP_001846630.1, XP_001865429.1, XP_0011863473.1), Ch. riparius (Cr: ACE19792.1),
Gl. morsitans morsitans (Gmm: AXyD\?311 .\), He. armigera (Ha: ADI32883.1,
ADI32881.1), Ma. sexta (Ms: CAL92020.1, CAM84317.1, CAM84318.1, CAM84319.1),
Sp. exigua (Se: AAX35812.1) and Te. molitor (Tm: ABC88746.1). The WAG substitution
model was used with variable and invariable positions and a bootstrap value of 1000 (only
those above 50 are represented on the trees). The scale represents the rate of amino acid
substitution per site. The novel Ph. papatasi chymotrypsin sequences are in bold.

Figure 6. Phylogenetic analysis of Ph. papatasi and Lu. longipalpis aminopeptidase L
Phylogenetic tree constructed using aminopeptidase L sequences from Ph. papatasi (Pp:
PpAPL1a (JP541859)), Lu. longipalpis (Ll: LlAPL1, LlAPL2a), Ar. thaliana (At:
CAA45040.1), Bo. taurus (Bt: AAB28170.1), Gl. morsitans morsitans (Gmm:
ADD18517.1), Ae. aegypti (Aa: EAT48532.1, EAT47208.1, EAT45789.1), Cu.
quinquefasciatus (Cq: XP_001866727.1, XP_001844372.1, XP_001851548.1), Le.
amazonensis (La: AAL16095.1), Le. infantum (Li: CAM68214.1), Le. major (Lm:
XP_001683430.1), Homo sapiens (Hs: AAD17527.1) and Bo. mori (Bm: NP_001108470.1).
The WAG substitution model was used with variable and invariable positions and a
bootstrap value of 1000 (only those above 50 are represented on the trees). The scale
represents the rate of amino acid substitution per site. Novel sand fly aminopeptidase L
sequences are in bold.

Figure 7. Phylogenetic analysis of Ph. papatasi and Lu. longipalpis carboxypeptidase B
Phylogenetic tree constructed using carboxypeptidase B sequences from Ph. papatasi (Pp:
PpCarbB2 (JP542144), PpCarbB3 (JP546271), ABV44754.1), Lu. longipalpis (Ll:
LlCarbB), An. gambiae (Ag: AAS99341.1, CAF28572.1), An. stephensi (As:
ADD31639.1), Ae. aegypti (Aa: AAT36733.1, AAT36732.1, ABO21077.1), Cu.
quinquefasciatus (Cq: XP_001856154.1, EDS34658.1, XP_001856164.1), Oc. epactius (Oe:
AAT36738.1). The WAG substitution model was used with variable and invariable positions
and a bootstrap value of 1000 (only those above 50 are represented on the trees). The scale
represents the rate of amino acid substitution per site. Novel Ph. papatasi carboxypeptidase
B sequences are in bold.

Figure 8. Phylogenetic analysis of Ph. papatasi PGRP sequences

Phylogenetic analysis of known and novel PGRP sequences from Ph. papatasi
(Pp: ABV60369.1, PpPGRP2 (JP540873), PpPGRP-SC1-3 (JP551327, JP547206,
JP546057)), Lu. longipalpis (Ll: ABV60332.1), Bo. mori (Bm: BAA77209.1), Ca.
dromedarius (Cd: CAC19553.1), Dr. melanogaster (Dm: AAF54643.1, AAG32064.1,
AAG23735.1, AAG23736.1), Mu. musculus (Mm: AAC31821.1), Ra. norvegicus (Rn:
AAF73252.1) and Tr. ni (Tn: AAC31820.1). The WAG substitution model was used with
variable positions and a bootstrap value of 1000 (only those above 50 are represented on the
trees). The scale represents the rate of amino acid substitution per site. Novel Ph. papatasi
PGRP sequences are in bold.

Figure 9. Ph. papatasi novel galectin sequences

Multiple sequence alignment of five novel Ph. papatasi galectin sequences (PpGal1b (JP539352),
PpGal2a-d (JP540648, JP546602, JP550066, JP540193)) and the known galectin sequence
(AAT11557.1) from the same organism. Darker shading indicates conservation at each
position.

Figure  10.  Ph.  papatasi  novel  proteins  transcript  levels 

qPCR analysis of Ph. papatasi novel putative genes at different life stages, including larvae
(L1-L4), pupa (P), adult females (F) and males (M) 1-3 days post emergence and adult
females at 24 h post blood meal (BF) for (A) PpChym4a (JP546634), (B) PpChym5
(JP551370), (C) PpChym6 (JP547341), (D) PpChym7 (JP554731), (E) PpChym8a
(JP549601), (F) PpChym9a (JP554746), (G) PpChym10 (JP540516), (H) PpChym11a
(JP543908), (I) PpChym12 (JP547864), (J) PpChym2 (AY128107), (K) PpTryp5a
(JP544502) and (L) PpPGRP-SC1 (JP551327).

Figure 11. PpChym13 and PpChym14 transcript levels

qPCR analysis of 2 Ph. papatasi novel putative chymotrypsins at different life stages,
including larvae (L1-L4), pupa (P), 1-3 days post emergence adult females (F) and males
(M) and adult females at 24h post blood meal (BF) for (A) PpChym13 (JP543702) and (B)
PpChym14a (JP542007). Additional blood feeding conditions for adult female sand flies at
3h, 36h and 72h post blood meal (BF) and post Le. major infected blood meal (INF) for (C)
PpChym13 and (D) PpChym14a.
Figure 12. PpPGRP and PpPGRP2 transcript levels

qPCR analysis of 2 Ph. papatasi novel putative PGRPs at different life stages, including
larvae (L1-L4), pupa (P), 1-3 days post emergence adult females (F) and males (M) and
adult females at 24h post blood meal (BF) for (A) PpPGRP (EU130784) and (B) PpPGRP2
(JP540873). Additional blood feeding conditions for adult female sand flies at 3h, 36h and
72h post blood meal (BF) and post Le. major infected blood meal (INF) for (C) PpPGRP
and (D) PpPGRP2.
Table 1

Assembly results for Phlebotomus papatasi and Lutzomyia longipalpis at all steps of the
assembly process presented in Fig. 1.

Assembly steps                                           Phlebotomus papatasi   Lutzomyia longipalpis

Initial sequences                      Total                   47,615                 27,928
                                       Normalized              47,615                 26,495
                                       Non-normalized          N/A                    1,433
Filtering                              Passed                  47,227                 27,863
                                       Failed                  388                    65
Cleaning and trimming                  Lucy                    37,708                 24,102
                                       SeqClean                37,487                 24,019
First tier assembly                    Contigs                 7,683                  6,049
                                       Singlets                22,121                 11,101
Second tier assembly                   Contigs                 6,187                  5,063
                                       Singlets                10,933                 4,963
                                       Total                   17,120                 10,026
Sequences longer than 200 nucleotides                          16,265                 N/A
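
As a quick sanity check on the counts in Table 1, the following minimal Python sketch verifies that the filtering and second tier assembly rows are internally consistent and computes the fraction of initial sequences that survive cleaning. The counts are transcribed from the table. The dictionary keys and function names are hypothetical labels introduced here for illustration and are not part of the original assembly pipeline.

```python
# Counts transcribed from Table 1; key names are illustrative labels only.
TABLE1 = {
    "Phlebotomus papatasi": {
        "initial": 47615, "passed": 47227, "failed": 388,
        "lucy": 37708, "seqclean": 37487,
        "tier2_contigs": 6187, "tier2_singlets": 10933, "tier2_total": 17120,
    },
    "Lutzomyia longipalpis": {
        "initial": 27928, "passed": 27863, "failed": 65,
        "lucy": 24102, "seqclean": 24019,
        "tier2_contigs": 5063, "tier2_singlets": 4963, "tier2_total": 10026,
    },
}


def check_counts(c):
    """Return simple consistency checks for one species' column."""
    return {
        # Passed + failed reads should equal the initial read count.
        "filtering_adds_up": c["passed"] + c["failed"] == c["initial"],
        # Second tier contigs + singlets should equal the reported total.
        "tier2_adds_up": c["tier2_contigs"] + c["tier2_singlets"] == c["tier2_total"],
        # Each cleaning step can only keep or reduce the number of sequences.
        "cleaning_monotonic": c["initial"] >= c["passed"] >= c["lucy"] >= c["seqclean"],
    }


def fraction_retained(c):
    """Fraction of initial sequences remaining after SeqClean."""
    return c["seqclean"] / c["initial"]


if __name__ == "__main__":
    for species, counts in TABLE1.items():
        print(species, check_counts(counts), f"{fraction_retained(counts):.1%}")
```

With the values above, all checks pass for both species, and roughly 79% of the Ph. papatasi and 86% of the Lu. longipalpis initial sequences remain after cleaning and trimming.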